RESPONSE TO RESTRICTION REQUIREMENT

Applicant elects the invention of Group I, claims 1-3, 5, 7-17 and 21-25.
Applicant submits that in light of the amendment to claim 6 (making it depend from claim [illegible]),
claim 6 is also in Group I and should be examined.
REMARKS

Applicant respectfully requests reconsideration and allowance of claims 1-3, 5-17
and 21-25 which are pending in the above-identified application. Claim 6 has been amended. 

A substitute specification is submitted herewith in response to the Examiner's
requirement.

Applicant has elected the invention of Group I and submits that claims 1-3, 5-17, 
and 21-25 read on the elected invention. 

In light of the above, Applicant submits that the instant claims are in condition for 
allowance. Early and favorable action is earnestly solicited. 
DATA PROCESSING CIRCUIT, MULTIPLIER UNIT WITH
PIPELINE, ALU AND SHIFT REGISTER UNIT FOR USE
IN A DATA PROCESSING CIRCUIT
BACKGROUND OF THE INVENTION

Nowadays a large number of personal computers make use of processors
which have a complex instruction set (CISC). Such processors are provided with a
central processing unit, the function of which is adjusted at each clock pulse to
perform the desired operation on two operand words. These processors are currently
commercially available under Intel code numbers beginning with 80.

Although the clock speed of the function adjustable processors has
been increased considerably, the organizational structure of such a processor forms a
great obstacle to further increasing the processing speed. For instance, during
multiplying and dividing of two operand words, frequent use must be made of
registers internally present in the computer.

So-called work stations often use a pipeline structure with a reduced
instruction set (the so-called RISC, Reduced Instruction Set Computer) in order to
increase the speed of the work station. This structure provides an increase in speed
of so-called vector operations, wherein a large number of data words have to be
subjected to the same arithmetic operation. Since a limited instruction set can be
implemented efficiently, the execution of a large number of instructions requires
only a single clock pulse.

Although the RISC structure achieves an increase in speed for
frequently occurring operations (such as multiplications), more complex instructions
for particular operations are omitted from the instruction set and, therefore, the
speed of executing such operations is not increased. In addition, the processing unit
is often designed for data words with a fixed word length, for example 32 or 64 bits.

In EP-A-0173383, a processor for floating point operations is 
disclosed. Such floating point operations are not useful for image or graphical 
processing applications, where operations have to be performed on integer data 
words of 8, 16 or 32 bits. 

In the article "The i860(TM) 64-bit supercomputing microprocessor" by
L. Kohn et al., published in the Proceedings of Supercomputing, 13-17 November
1989, Reno, Nevada, USA, 1989, IEEE Computer Society Press, Washington D.C., a
RISC based microprocessor for executing multiplications for either 64 bit or 32 bit
words is described. As described above, such a RISC concept does not provide for
increased speed when integer data words of 8 bits or multiples thereof have to be
processed.

Also, in EP-A-0380100, a multiplier is disclosed for processing 32 bit
operands to provide two 16 bit by 16 bit fixed point products or one 32 bit floating
point product during each clock cycle.

For image and/or graphics processing applications, however,
operations have to be performed on data words of 8 or 16 bits or a number of 
mutually associated bytes before even a limited speed increase is achieved in the 
RISC concept. 

The present invention provides a data processing circuit comprising: 

- a multiplier unit for multiplying integer data words of 8 bits or 
multiples thereof having a pipeline and in which the word length is adjustable for 
multiplying the integer data words; 

- an arithmetic logic unit (ALU) having an adjustable word length for 
performing arithmetic operations on integer data words of 8 bits or multiples thereof; 

- a register unit provided with at least two registers for storing the
integer data words of 8 bits or multiples of 8 bits on which the operation and/or
pipeline multiplication has to be performed; and

- a bus structure which comprises a number of separate buses and
which effects the transport of integer data words from and to the multiplier unit, the 
arithmetic logic unit and the register unit. 

The data processing unit according to the present invention achieves a
speed, for graphic applications, that is more than twice as great as in existing
systems. In contrast to RISC and CISC, the data flow between the above specified
circuits (multiplier, ALU etc.) is not fixed. Rather, the programmer is free to program
the sequence of the data flow through the different units (free pipeline).

The present invention further provides a multiplier unit with a
pipeline for use in a data processing circuit.

The present invention also comprises an arithmetic logic unit for use 
in a data processing circuit. 

Finally, the present invention provides a shift register unit for use in a data 
processing circuit. 


BRIEF DESCRIPTION OF THE DRAWINGS 

Further advantages, features and details of the present invention will be
elucidated on the basis of the following description of a preferred embodiment
thereof with reference to the annexed drawing, in which:

Fig. 1 shows a functional diagram of a graphic application of a data
processing circuit according to the present invention;

Fig. 2 shows an outline diagram of the data processing circuit of Fig. 1;

Fig. 3 shows a functional diagram of the internal structure of the data
processing circuit of Fig. 1;

Fig. 4 shows a first functional diagram of the arithmetic logic unit of 
the diagram of Fig. 3; 

Fig. 5 shows a second functional diagram of the arithmetic logic unit
of the diagram of Fig. 3;

Fig. 6 shows a functional diagram of the multiplier unit with pipeline 
of the diagram of Fig. 3; 

Fig. 7 shows a functional diagram of a Wallace tree in the diagram of
Fig. 6; and

Fig. 8 shows a functional diagram of the shift register unit from the 
functional diagram of Fig. 3. 

DETAILED DESCRIPTION OF THE INVENTION : 

A data processing circuit 1 (Fig. 1) according to the present
invention, also named DISC or IMAGINE, is coupled via a bus 2 to a data memory
3, for instance SRAM (Static Random Access Memory). The data processing circuit
1 is further connected via a bus 4 to a main or video memory 5 for storage of image
data, which is constructed from DRAM (Dynamic Random Access Memory) cells or
is a (more expensive) VRAM. This main memory 5 drives a RAMDAC (Random
Access Memory for a Digital Analog Converter) 7 via bus 6, which in turn provides
a monitor (not shown) with the color signals R (red), G (green) and B (blue).

In practical applications the data processing circuit 1 will be coupled
via a buffer 9 and access logic 10 to a host processor (not shown). The configuration
of Fig. 1 is preferably further provided with an instruction RAM 11 which is coupled
via a bus 12 to the data processing circuit 1 as well as via a buffer 112 in which
registers and drive means are incorporated. A clock means 13 provides the diverse
components of the configuration with clock signals while a circuit 14 is included in
the configuration for the video timing. A video input circuit 15 is preferably
connected to the bus 6 for feeding video signals to the image memory 5.

The structure of the data processing circuit is shown schematically in
Fig. 2 and comprises a parallel multiplier 20 which comprises a RAM 21, an
accumulator 22 and a Wallace tree 23. The data processing circuit also comprises a
data input and output circuit 24, a parallel shift register 25, a bus structure 26, a
circuit 28 for unary operations, a circuit 29 for driving the image memory, a circuit 
30 for image input and output, an arithmetic logic unit 31, a circuit 32 for driving the 
register bank and a vector index generator, a register bank 33, a mask generator 34 
which comprises a transparent mask 35, an opaque mask 36, a window mask 37, a 
line mask 38, a polygon mask 39, a mask assembly means 40 and a range check 41, a 
circuit with phase-locked loop 42 and a circuit 43 for instruction processing which 
comprises a program control 44, start-up ROM 45 and an interrupt processing means 
46. 

The bus structure 26 comprises an A-bus 52, a B-bus 53, a Q-bus 54, an F-bus 55,
an M-bus 56, a U-bus 57, a D-bus 58 and a Y-bus 59, each of which is, for instance,
32 bits wide.

The register bank is coupled via registers 60 and 61 to the
A and B bus respectively. Register bank 33 contains ninety-six inputs which are single
32 bit, double 16 bit or quadruple 8 bit words. Three ports enable simultaneous
performance of two read actions and a write action. Sixty-two of the ninety-six
registers are directly accessible. The remaining thirty-two inputs are addressed via
the vector index generator 32 which can generate a maximum of 12 locations per
cycle (i.e., four byte sections for each of the three ports, since each word segment can
be selected separately within the registers).
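
The three word formats held in the register bank can be illustrated with a minimal Python sketch. The helper names split_word and join_word are hypothetical, and the ordering of the sections (section 0 being the least significant) is an assumption that the specification does not state.

```python
def split_word(word, lanes):
    """Split a 32 bit word into 1, 2 or 4 equal sections.

    Section 0 is taken to be the least significant one (an assumption).
    """
    if lanes not in (1, 2, 4):
        raise ValueError("lanes must be 1, 2 or 4")
    width = 32 // lanes
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(lanes)]


def join_word(sections):
    """Pack 1, 2 or 4 sections back into a single 32 bit word."""
    if len(sections) not in (1, 2, 4):
        raise ValueError("expected 1, 2 or 4 sections")
    width = 32 // len(sections)
    mask = (1 << width) - 1
    word = 0
    for i, section in enumerate(sections):
        word |= (section & mask) << (i * width)
    return word
```

Because each section can be selected separately, a read of four byte sections for one port corresponds here to four entries of split_word(word, 4).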

The parallel shift register 25 is designed such that it can shift 32 bits
of data anywhere from 1 to 32 positions to the left or right in one clock cycle based
on the information received via the A-bus 52. The information can be grouped into
one, two or four sections of 32, 16 and 8 bits respectively. The shift can take place
logically (unsigned), numerically (signed) and rotatingly. The operands are received
from the B-bus 53 or the F-bus 55. The parallel shift register 25 is connected via a
register 62 to the Q-bus 54. Fig. 8 schematically shows an example of a two step
rotation of a 32 bit word (consisting of two 16-bit bytes) through 11 bits in a positive
direction by way of four 8 bit rotations and eight 4 bit crossings.
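
The sectioned shift described above can be modelled in software as follows. This is only a behavioural sketch: the function name lane_shift, its parameters and the section ordering are assumptions, and it does not reproduce the two step rotation network of Fig. 8.

```python
def lane_shift(word, amount, lanes=1, mode="logical", left=True):
    """Shift each section of a 32 bit word independently.

    lanes: 1, 2 or 4 sections of 32, 16 or 8 bits.
    mode:  "logical" (unsigned), "arithmetic" (signed) or "rotate".
    """
    if lanes not in (1, 2, 4):
        raise ValueError("lanes must be 1, 2 or 4")
    if mode not in ("logical", "arithmetic", "rotate"):
        raise ValueError("unknown mode")
    width = 32 // lanes
    mask = (1 << width) - 1
    out = 0
    for i in range(lanes):
        v = (word >> (i * width)) & mask
        if mode == "rotate":
            k = amount % width
            if not left:
                k = (width - k) % width      # rotate right = rotate left by width - k
            r = ((v << k) | (v >> (width - k))) & mask
        elif left:
            r = (v << amount) & mask         # bits leaving the section are dropped
        elif mode == "arithmetic":
            sign = v >> (width - 1)
            signed = v - (sign << width)     # reinterpret the section as signed
            r = (signed >> amount) & mask    # sign bit is replicated
        else:
            r = v >> amount
        out |= r << (i * width)
    return out
```

For example, lane_shift(0x00FF00FF, 4, lanes=2) shifts both 16 bit sections left by four positions without any bits crossing the section boundary.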

With reference to Fig. 3, the arithmetic logic unit 31 (ALU) is
connected to the A-bus 52, the Q-bus 54, the M-bus 56, the D-bus 58, the U-bus 57,
the B-bus 53, the F-bus 55, again to the U-bus 57 and the Y-bus 59. All the usual
logic operations of a conventional ALU can be performed by the ALU of the present
invention in addition to numerical functions such as addition, subtraction, increment
and decrement. The ALU 31 is further provided with a so-called parametric logic
function. On the basis of the content of an 8 bit register, the ALU 31 can perform a
random combination of 256 possible logic operations on 3 operands. The standards
for X-window and MS-windows specify that logic and graphic operations must be
possible in any combination. The parametric function can also be used to realize
shifting, masking, combining or comparing operations in a single clock cycle.
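
One way to read the parametric logic function is as an 8 bit truth table applied bit by bit to three operands, which yields exactly 256 possible operations. The following Python sketch illustrates that reading; the function name parametric_logic and the assignment of truth table bits to operand combinations are assumptions, not taken from the specification.

```python
def parametric_logic(code, a, b, c, width=32):
    """Apply the 3-input logic function selected by an 8 bit code.

    For every bit position, the operand bits form an index
    (a_bit << 2) | (b_bit << 1) | c_bit, and the result bit is
    bit number 'index' of 'code'. This bit ordering is an assumption.
    """
    mask = (1 << width) - 1
    result = 0
    for index in range(8):
        if (code >> index) & 1:
            term_a = a if index & 4 else ~a
            term_b = b if index & 2 else ~b
            term_c = c if index & 1 else ~c
            result |= term_a & term_b & term_c   # minterm selected by the code
    return result & mask
```

Under this ordering, code 0xC0 gives a AND b, code 0x96 gives the exclusive OR of all three operands, and code 0xE4 selects a where c is set and b elsewhere, which is a masking operation of the kind mentioned above.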

The ALU 31 can be adjusted as a single, double or quadruple parallel
unit for 32, 16 and 8 bit operands respectively. The data coming from the A-, Q-
or D-buses determines the selection of the size of the operands to be processed. A
mode selector 63 is connected to the ALU 31 and generates a status signal on output
64. The ALU 31 is further connected to the F-bus 55 via an output register 64. Fig. 4
shows a functional diagram of the ALU for a parallel quadruple operation on
operands of 24 bits, while Fig. 5 shows a functional diagram of a double operation
with 48 bit operands. In Fig. 5, two selectors and two accumulators, each of 8 bits,
are combined.
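
The effect of setting the ALU to single, double or quadruple operation can be sketched for addition as follows. The function name lane_add is hypothetical, and discarding the carry at each section boundary is an assumption; the specification does not describe overflow handling.

```python
def lane_add(a, b, lanes=1):
    """Add two 32 bit words as 1, 2 or 4 independent sections.

    Carries are not propagated across section boundaries (assumed
    wrap-around behaviour).
    """
    if lanes not in (1, 2, 4):
        raise ValueError("lanes must be 1, 2 or 4")
    width = 32 // lanes
    mask = (1 << width) - 1
    out = 0
    for i in range(lanes):
        shift = i * width
        section_sum = ((a >> shift) + (b >> shift)) & mask
        out |= section_sum << shift
    return out
```

The same operand pair gives different results depending on the mode: adding 1 to 0xFFFFFFFF yields 0 as a single 32 bit operation, 0xFFFF0000 as a double operation and 0xFFFFFF00 as a quadruple operation.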

The multiplier 23 is embodied as a pipeline with five clock cycles. The
multiplier is capable of performing pipeline operations on 32 bit, 16 bit and 8 bit
words. All possible multiplication operations with numbers, signed and unsigned, or
a combination thereof, in addition to execution of the multiplication of 16 bit
complex numbers and 8 bit matrices with vectors, are possible due, inter alia, to the
presence of a Wallace tree (Fig. 7). The multiplier operates internally with 48 bit
results or double 24 bit or quadruple 12 bit values, two of which are transported
simultaneously via 96 bit data channels. Fig. 6 shows a functional diagram of the
multiplier with five clock levels. The multiplier is connected to the M-bus 56 via an
output register 66.
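
The general principle of a Wallace tree, reducing many partial products with carry-save adders until only two addends remain, can be sketched in Python as below. This is a generic textbook illustration for unsigned operands only; it is not the structure of Fig. 7, and the names carry_save and wallace_multiply are hypothetical.

```python
def carry_save(x, y, z):
    """3:2 compressor: reduce three addends to a sum word and a carry word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def wallace_multiply(a, b, width=8):
    """Multiply two unsigned 'width' bit values by carry-save reduction."""
    # One shifted copy of 'a' for every set bit of 'b' (the partial products).
    rows = [a << i for i in range(width) if (b >> i) & 1]
    while len(rows) > 2:
        reduced = []
        # Compress the rows in groups of three; leftovers pass through unchanged.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
    # A single conventional addition finishes the product.
    return sum(rows)
```

Each reduction level has no carry propagation, which is why such a tree lends itself to being split over pipeline stages.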

The circuit for unary operations 28 converts data, for instance, binary
to unary (linear), indicates the position of the most significant bit, determines the
absolute value of a sign and reverses the bit sequence of a word. Circuit 28 can
operate on a word of 32, 16 or 8 bits.
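
The four unary operations named above can be sketched in Python as follows. All function names are hypothetical, and two interpretations are assumptions: "binary to unary" is read as a thermometer code of n low-order ones, and the absolute value is taken of a two's complement word.

```python
def to_unary(value, width=32):
    """Binary to unary: the value n becomes n low-order one bits (assumed reading)."""
    if not 0 <= value <= width:
        raise ValueError("value does not fit in the word")
    return (1 << value) - 1


def msb_position(word):
    """Position of the most significant set bit, or -1 for a zero word."""
    return word.bit_length() - 1


def absolute(word, width=32):
    """Absolute value of a two's complement word of the given width."""
    mask = (1 << width) - 1
    word &= mask
    return (-word) & mask if word >> (width - 1) else word


def reverse_bits(word, width=32):
    """Reverse the bit sequence of a word of the given width."""
    out = 0
    for i in range(width):
        out = (out << 1) | ((word >> i) & 1)
    return out
```

The width parameter stands in for the 32, 16 or 8 bit word sizes on which circuit 28 can operate.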

The mask generator 34 has a number of independent sub-units. The
window mask 37 determines within which regions the other operations must fall. The
circuit 41 for range checking operates on the basis of pre-defined patterns and,
therefore, one of its most important applications is generating letter characters. The
circuit 41 also serves to check three-dimensional pixel data, such as depth and color.

The line mask 38 generates a horizontally defined pattern between a
predetermined beginning and end. The line mask 38 can generate up to four lines
simultaneously and supports, for instance, the creation of polygons. A shape along a
horizontal line of the image can be produced using the line mask 38, when no
interruptions occur along the line.
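
An uninterrupted horizontal run between a beginning and an end can be represented as a bit mask. The sketch below assumes that bit i of the mask stands for pixel i and that both end points are included; the name line_mask is hypothetical.

```python
def line_mask(begin, end, width=32):
    """Mask with ones for pixel positions begin..end inclusive (bit i = pixel i)."""
    if not 0 <= begin <= end < width:
        raise ValueError("begin and end must satisfy 0 <= begin <= end < width")
    run_length = end - begin + 1
    return ((1 << run_length) - 1) << begin
```

Generating up to four lines simultaneously would correspond to evaluating four such masks with different end points in the same cycle.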

The polygon mask 39 serves to generate elements for which the line
generator is not suitable, for instance, Chinese characters. The polygon mask 39
defines the number of contour transitions on the horizontal lines passing through a
relevant pixel.
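
Counting contour transitions along a horizontal line is commonly turned into an inside/outside decision with the even-odd rule. The specification does not say which rule the polygon mask 39 applies, so the following Python sketch, with the hypothetical name polygon_span_mask, is only one plausible illustration.

```python
def polygon_span_mask(crossings, width=32):
    """Return a mask for one horizontal line; bit x is set when pixel x is inside.

    'crossings' lists the pixel positions at which the line crosses a contour.
    A pixel is inside when an odd number of transitions lies at or before it
    (even-odd rule, an assumption).
    """
    toggles = [0] * width
    for x in crossings:
        toggles[x] ^= 1              # two transitions at the same pixel cancel out
    mask = 0
    inside = False
    for x in range(width):
        if toggles[x]:
            inside = not inside      # an odd number of transitions flips the state
        if inside:
            mask |= 1 << x
    return mask
```

With crossings at 1, 3, 5 and 7 the result contains two separate runs on the same line, a pattern that a single uninterrupted line mask cannot express.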

The mask assembly 40 performs the function of overlaying diverse
masks. The results from the mask assembly 40 are transmitted to the respective
transparent and/or opaque masks 35, 36 where the actual image for display is created.
The transparent and opaque masks 35, 36 can both contain a maximum of 128 pixels
in a matrix of 4 x 32.

The circuit for data input and output 24 is connected to a 32 bit data
channel and a 32 bit address bus. The range for addressing comprises 32 Mbyte.

The entry of instructions takes place under the control of the
program control unit 44. With a 22 bit address, a following instruction word is 
continuously assigned which is subsequently entered via a separate 64 bit bus. The 
program memory can have a size of 4M x 64 bits. 
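
The stated program memory size follows directly from the address width, as the short calculation below shows. The function name program_memory_size is hypothetical; the figures of 22 address bits and 64 bit instruction words are those given above.

```python
def program_memory_size(address_bits=22, word_bits=64):
    """Return (number of addressable words, total size in bytes)."""
    words = 1 << address_bits            # 2**22 distinct instruction addresses
    total_bytes = words * word_bits // 8
    return words, total_bytes


words, total_bytes = program_memory_size()
print(words == 4 * 1024 * 1024)          # 4M instruction words
print(total_bytes // (1024 * 1024))      # size in Mbyte
```

A 22 bit address thus selects one of 4M instruction words of 64 bits each, which amounts to 32 Mbyte of program memory.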

The drive of the image memory 29 is adapted to generate an address 
on the basis of an X/Y position so that any random image segment can be addressed 
on the basis of its location in the image and in the image memory. The image
memory is also suitable for storing other data banks such as lists and data banks with 
graphic elements. 
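
Generating a memory address from an X/Y position can be sketched as follows. The specification does not give the address mapping used by the drive 29, so the row-by-row linear layout and the names pixel_address, pitch, base and bytes_per_pixel are all assumptions made for illustration.

```python
def pixel_address(x, y, pitch, base=0, bytes_per_pixel=1):
    """Address of pixel (x, y) in a linear, row-by-row image memory.

    pitch:           number of pixels per image line (assumed layout)
    base:            start address of the image in memory
    bytes_per_pixel: storage size of one pixel
    """
    if x < 0 or y < 0 or x >= pitch:
        raise ValueError("pixel position outside the image line")
    return base + (y * pitch + x) * bytes_per_pixel
```

Addressing an image segment by its location then amounts to evaluating this mapping for the corner of the segment and stepping by the pitch from line to line.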

When a clock frequency of 66 MHz is used for a data processing
circuit according to the present invention, it is possible to operate the system such that
the access time for the memory is 70 ns.

The data processing circuit can be programmed in a higher-level
language, such as C, so that it is easily programmed, as in RISC and CISC processing
units. The data processing circuit 1 can be programmed with instructions according
to the RISC concept as well as with the CISC instructions of a personal computer.
In order to achieve a large increase in speed for graphic applications, the programmer
can program all functions of the data processing circuit 1 at a lower level via an
instruction field of 64 bits. The ALU 31 and the multiplier unit can be set to parallel
operations, whereby the speed for graphic applications can be increased by a factor
of 4-20 as compared to existing RISC processors. For a particular application, a
programmer will set a "once-only" series of instructions and control registers.
Subsequently, the programmer will start the processor with one command, whereafter
the processor independently processes the pixel flows.

As an example of the speed increase which can be gained by way of the
present invention, an algorithm consisting of five instructions for rotating and
interpolating a color image is presented, which can accommodate a total of 38
instructions, that is:



read 2 x 16 bit register; 

increment 2 x 16 bit register address; 

read 1 x 10 bit constant; 

shift 2 x 16 bit word; 

read 2 x 16 bit constant; 

add 2 x 16 bit value; 

read out 4 x 8 bit 2D memory data; 

read out 4 x 8 bit image memory data; 

increment 1 x 32 bit image memory address; 

multiply 4 x 8 bit value;

read 4 x 12 bit accumulator register;

accumulate 4 x 12 bit value;

write 4 x 12 bit accumulator register; and

increment 2 x 5 bit register address accumulator.

The data processing circuit according to the present invention can be 
built into specific equipment but can also be embodied as an extension card for a 
personal computer. Owing to the flexible utilization of the hardware, even at lower
clock speeds than, for instance, 200 MHz, which is currently among the highest,
a 5 to 20 times improvement in image processing speed can be obtained. This
makes the data processing circuit according to the present invention suitable for 
real-time video operations and so-called virtual reality. 

Since it is practically impossible to describe all the possibilities of
the present invention on account of its complexity, a product specification
incorporated by reference, insofar as this is completed, is appended as an annex. As
is usual in this technical field, this specification is written in the English language.
After completion it will become part of the public domain, probably within a year.
WHAT IS CLAIMED IS:

1. A circuit for processing integer data for graphic image processing
applications, comprising:

a multiplier unit having a pipeline for multiplying integer data words of 8 bits
or multiples thereof, the pipeline being adjustable to the length of the integer data
words to be multiplied;

an arithmetic logic unit (ALU) for performing arithmetic operations on
integer data words of 8 bits or multiples thereof, the word length of the ALU being
adjustable in accordance with the multiple of 8 bits constituting the integer and data
words;

a register unit provided with at least two registers for storage of the integer
data words on which one of the operation and pipeline multiplication has to be
performed; and

a bus structure for effecting the transport of integer data words from and to
the multiplier unit, the arithmetic logic unit and the register unit, the bus structure
having a plurality of separate buses each having a separate register connected thereto
for transmitting and receiving the integer data words.



2. The circuit according to claim 1, wherein the pipeline is a five-step 
pipeline. 

3. The circuit according to claim 1, wherein the integer data comprises one
of 32 bit words, 16 bit words, and 8 bit words.

4. A multiplier unit having a pipeline and a variable length accumulator, the
multiplier having a word length which is adjustable and the multiplication is
performed in accordance with the length of integer data words being multiplied, the
length of the integer data words being 8 bits or a multiple thereof.

5. An arithmetic logic unit comprising a plurality of partitioned arithmetic
logic units therein, the word length of the arithmetic logic unit being adjustable in
accordance with the length of the integer data words being processed, the length of
the integer data words being 8 bits or multiples thereof.

6. A shift register unit having control logic capable of receiving an integer
data word having a length which is variable in increments of 8 bits, the shift register
unit for shifting a 32 bit integer data word through a distance of 1 to 32 bits, in one of
a left and a right direction and in one of a rotating and a non-rotating manner.

7. The circuit according to claim 1, in integrated form. 

8. The circuit as claimed in claim 1, further comprising an instruction
register, wherein the bus structure is provided with a plurality of registers and
wherein the transport of the integer data words from and to the multiplier unit, the
arithmetic logic unit and the register unit is programmable from the instruction
register.

9. The circuit according to claim 2, wherein the integer data comprises 32 bit 
or 16 bit words. 

10. The circuit according to claim 2, wherein the integer data comprises 32
bit or 16 bit words.



11. The circuit according to claim 3, wherein the integer data comprises 32
bit or 16 bit words.

12. The circuit as claimed in claim 2, further comprising an instruction
register, wherein the bus structure is provided with a plurality of registers and
wherein the transport of the integer data words from and to the multiplier unit, the
arithmetic logic unit and the register unit is programmable from the instruction
register.

13. The circuit as claimed in claim 3, further comprising an instruction 
register, wherein the bus structure is provided with a plurality of registers and 
wherein the transport of the integer data words from and to the multiplier unit, the
arithmetic logic unit and the register unit is programmable from the instruction 
register. 



14. The circuit as claimed in claim 7, further comprising an instruction
register, wherein the bus structure is provided with a plurality of registers and
wherein the transport of the integer data words from and to the multiplier unit, the
arithmetic logic unit and the register unit is programmable from the instruction
register.



15. A circuit for processing digital data words, comprising:

a multiplier unit for multiplying the digital data words, the multiplier unit
having a pipeline in which the word length is adjustable to match the length of the
digital data words, the digital data having a length which varies incrementally;

an arithmetic logic unit (ALU) capable of performing arithmetic operations
on the digital data words, the ALU being adjustable to match the length of the digital
data words;

a register unit having at least two registers for storage of the digital data
words; and

a bus structure for transporting the digital data words from and to the
multiplier unit, the arithmetic logic unit and the register unit, the bus structure having
a plurality of separate buses each having a register connected thereto for transmitting
and receiving the digital data words.



16. The circuit according to claim 15, wherein the pipeline is a five-step 
pipeline. 

17. The circuit according to claim 15, wherein the multiplier unit further
comprises a Wallace tree. 



18. The circuit according to claim 15, wherein the digital data words have a
length which varies in increments of 8 bits. 



19. A multiplier unit for multiplying digital data words having a length
which varies incrementally, the multiplier unit having a pipeline, the pipeline having
a word length which is adjustable to match the length of the digital data words.



20. An arithmetic logic unit capable of performing arithmetic operations on
digital data words having a length which varies incrementally, the arithmetic logic
unit being adjustable to match the length of the digital data words.

DATA PROCESSING CIRCUIT, MULTIPLIER UNIT WITH
PIPELINE, ALU AND SHIFT REGISTER UNIT FOR USE
IN A DATA PROCESSING CIRCUIT




ABSTRACT OF THE DISCLOSURE 

The present invention provides a circuit for processing integer data,
especially for graphic applications, having a multiplier unit which includes a pipeline
in which the word length is adjustable for multiplying integer data words of 8 bits
or multiples thereof; an arithmetic logic unit (ALU) for performing arithmetic
operations on integer data words, the word length of which is adjustable in 8 bits or
multiples thereof; a register unit provided with at least two registers for storage of
integer data words having multiples of 8 bits on which the operation and/or pipeline
multiplication has to be performed; and a bus structure having a number of separate
buses which effects the transport of integer data words from and to the multiplier
unit, the arithmetic logic unit and the register unit.